Design of a SIMO Deep Learning-Based Chaos Shift Keying (DLCSK) Communication System

This paper proposes a Deep Learning (DL)-based Chaos Shift Keying (DLCSK) demodulation scheme to improve the capabilities of existing chaos-based wireless communication systems. Coherent Chaos Shift Keying (CSK) schemes require synchronization of chaotic sequences, which is still practically impossible in a noisy environment. Moreover, the conventional Differential Chaos Shift Keying (DCSK) scheme has the drawback that, for each bit, half of the bit duration is spent sending non-information-bearing reference samples. To address these drawbacks, a Long Short-Term Memory (LSTM)-based receiver is trained offline on chaotic maps over a finite number of channel realizations and is then used to classify modulated signals online. We show that the proposed receiver can learn different chaotic maps, estimate channels implicitly, and retrieve the transmitted messages without any need for chaos synchronization or reference signal transmission. Simulation results for both AWGN and Rayleigh fading channels show a remarkable BER improvement over the conventional DCSK scheme. The proposed DLCSK system opens opportunities for a new class of receivers that leverage the advantages of DL, such as effective serial and parallel connectivity. A Single Input Multiple Output (SIMO) architecture of the DLCSK receiver with excellent reliability is introduced to demonstrate these capabilities. The SIMO DLCSK benefits from a DL-based channel estimation approach, which makes the architecture simpler and more efficient for applications where channel estimation is problematic, such as massive MIMO, mmWave, and cloud-based communication systems.


Introduction
Chaotic signals are wide-band, noise-like signals with robust and reproducible statistical features [1,2]. Thus, digital modulation using chaotic signals offers an inherently simple solution for robust and secure communications over multi-path fading channels [3,4]. Chaos-based modulations are generally classified into two main classes [5]. The first class comprises coherent detection schemes, such as the Chaos Shift Keying (CSK) scheme [6], in which data are transmitted as a combination of basis functions obtained from chaotic waveforms. Chaotic synchronization is commonly used to recover the basis functions on the receiver side [7][8][9]. Theoretically, CSK modulation can achieve the Bit Error Rate (BER) performance of Binary Phase Shift Keying (BPSK) [10]. In practice, this performance is not achievable because some problems still need to be solved, namely the recovery of basis functions and the estimation [11][12][13][14][15][16]. Since chaotic synchronization is still practically impossible in a turbulent environment [17], coherent schemes cannot work properly in practice.
The Differential Chaos Shift Keying (DCSK) system [18] is a variant of CSK where the basis functions have a special arrangement and the information can be revealed from the correlation between the parts of the basis functions. In DCSK modulation, each bit duration

Background
DL is a powerful tool for solving problems that are difficult to describe through mathematical models [27]. There are two general methods for designing DL-based architectures, namely DL-based receiver design and joint DL-based transceiver design [28]. A DL-based joint transceiver design optimizes the system as an end-to-end auto-encoder [29]. In contrast, a DL-based receiver optimizes one or more blocks in the receiver [30]. As an example of the latter, the authors of [31] suggested a DL-based receiver that indirectly estimates wireless channels and retrieves the signals. They demonstrated that DL-based receivers are able to learn the features of wireless channels, including non-linear distortions. Practical wireless channels change over time with a large dynamic range. In this case, neural networks (NNs) can hardly perform well in detection tasks under different channel coefficients [32]. To overcome this challenge, Deep Transfer Learning (DTL) adapts an NN to extract the time-varying features from a small amount of online training data by transferring knowledge from a source domain to a target domain [33][34][35]. Motivated by this approach, in this paper we propose a framework for the detection of chaotic-modulated signals.
In this study, our focus is on the design of a DL-based receiver that can be easily integrated with existing CSK transmitters. Recently, DL has attracted a lot of attention because of its effectiveness in the data-driven analysis of chaotic dynamics [36]. Some studies have used DL in chaos-based communications [37][38][39][40]; however, their methods need further exploration. In [39], the authors proposed an algorithm to demodulate the reference chaotic signals iteratively. In contrast to traditional demodulation methods, which use time or frequency resources to improve reliability, the iterative receiver exploits the feature extraction capability of neural networks (NNs). In [40], an intelligent OFDM-DCSK demodulator is proposed that uses an LSTM-aided NN to extract the correlations between chaotic-modulated OFDM-DCSK signals in order to retrieve the transmitted information. However, the LSTM-aided scheme still transmits reference signals that do not carry useful information. A similar problem can be observed in the other proposed DL-based modulations (e.g., in [38,39]).
In this paper, we present an innovative Long Short-Term Memory (LSTM)-based receiver that does not require any reference signal transmissions or chaotic synchronization circuits for data detection. The proposed DL-based Chaos Shift Keying (DLCSK) system benefits from other advantages of NNs, such as generalization and fast training. A multi-antenna design of the DLCSK receiver that uses one LSTM per antenna is also presented. Many articles have combined multi-antenna technology with chaos-based communications [41][42][43][44][45][46][47][48][49][50][51][52][53]. For instance, a MIMO STBC-DCSK system does not require any complicated channel estimation, carrier synchronization, or rake reception [54]. However, these conventional techniques transmit the reference signals over the channel, which increases the redundancy and complexity of the system. Previous research on neural network ensembles shows that combining network architectures is frequently more accurate than using single networks [55][56][57]. The proposed Single Input Multiple Output (SIMO) DLCSK relies on a fusion method to achieve a diversity gain, where the outputs of all LSTMs are combined using a majority voting strategy. This design maintains the advantages of traditional multi-antenna and chaos-based systems, i.e., it does not require rake receivers, CSI, or chaotic synchronization. Due to these characteristics, DLCSK is an appealing candidate for secure data transmission in cloud-based systems [58][59][60], vehicular communications [61][62][63], and massive MIMO systems [64][65][66][67].

Contributions
The objective of this work is to develop a DL-based receiver that benefits from the inherent security of chaotic signals and the merits of NNs. This design strengthens the capabilities of existing chaos-based schemes, such as reliability and energy efficiency. The innovative aspects of this paper are summarized below:

• We train an LSTM-based classifier that enables online classification of the received chaotic signals. This method can mitigate the chaotic synchronization problem in existing CSK receivers.
• The DLCSK modulation/demodulation scheme does not need any reference transmission for basis function recovery, unlike reference-based non-coherent modulations. Owing to these advantages, the BER performance of a single-antenna DLCSK under a Rayleigh fading channel is close to that of antipodal CSK, which shows an outstanding BER performance among all chaotic modulations [10].
• We propose a multi-antenna architecture of the DLCSK system. Multiple LSTMs are used at the receiver end to obtain a diversity gain. We numerically simulate the SIMO DLCSK structure and show the advantages of fusing the hard outputs of the LSTMs to reach a decision.
The rest of this paper is organized as follows. Section 2 presents a statistical study of existing CSK systems and correlation receivers. Section 3 describes the structure of the proposed SIMO DLCSK system and the basics of the LSTM-based classifier. Section 4 presents and discusses the simulation results. Section 5 concludes the paper.

Traditional Correlation Receivers
This section presents a statistical study of a typical CSK system equipped with correlation receivers, in order to identify its unsolved problems more precisely. In contrast to regular communication systems, in which the basis functions are periodic and orthogonal (e.g., sinusoidal functions), the basis functions of a chaotic communication system are not necessarily orthogonal and differ from one symbol to the next. Consider a coherent CSK system using two basis functions [10]. The elements of the signal set are given by
$$s_b = s_{b1}\,\hat{x} + s_{b2}\,\breve{x}, \qquad (1)$$
where $b$ is the index of the current transmission bit, the weights $s_{b1}$ and $s_{b2}$ are the elements of the signal vector, and the basis functions $\hat{x}$ and $\breve{x}$ are discrete-time chaotic signals with $\beta$ samples, i.e., $\hat{x} = \{\hat{x}_1, \ldots, \hat{x}_\beta\}$ and $\breve{x} = \{\breve{x}_1, \ldots, \breve{x}_\beta\}$. A binary data symbol is spread by a chaotic signal over the bit duration $T_b$. Thus, we have $T_b = \beta T_c$, where $T_c$ is the time between two consecutive chaotic samples (chips). The transmitted sample functions are $s_1 = \sqrt{E_b}\,\hat{x}$ and $s_2 = \sqrt{E_b}\,\breve{x}$, representing the symbols "0" and "1", respectively. The corresponding signal vectors are $(s_{11}, s_{12}) = (\sqrt{E_b}, 0)$ and $(s_{21}, s_{22}) = (0, \sqrt{E_b})$, where $E_b$ stands for the average energy per bit. With two basis functions, the receiver must be configured with at least two correlators. The message is detected by correlating the received signal with two reference signals $\hat{x}^{\mathrm{ref}}$ and $\breve{x}^{\mathrm{ref}}$, and forming the corresponding observation signals (so-called decision variables) $D_{b1}$ and $D_{b2}$. If $D_{b1} > D_{b2}$, then the decision is "1", and if $D_{b1} < D_{b2}$, then the decision is "0". Consider now the output of a correlator under a noisy channel. In a coherent CSK system, the reference signals $\hat{x}^{\mathrm{ref}}$ and $\breve{x}^{\mathrm{ref}}$ are derived from the noisy received signal $(s_b + n)$. The decision variable $D_{b1}$ at the output of the correlator can be written as
$$D_{b1} = \int_{T_s}^{T_b} (s_b + n)\,\hat{x}^{\mathrm{ref}}\,dt, \qquad (2)$$
where $T_s$ is the synchronization transient time for each bit duration.
Note that $D_{b1}$ is a random variable whose mean value depends on $E_b$ and on the quality of the regenerated reference signal $\hat{x}^{\mathrm{ref}}$. If perfect synchronization is maintained throughout the transmission, we have $T_s = 0$, $\hat{x}^{\mathrm{ref}} = \hat{x}$, and $\breve{x}^{\mathrm{ref}} = \breve{x}$. In this case,
$$D_{b1} = s_{b1}\int_0^{T_b} \hat{x}^2\,dt + s_{b2}\int_0^{T_b} \breve{x}\,\hat{x}\,dt + \int_0^{T_b} n\,\hat{x}\,dt. \qquad (3)$$
The chaotic basis function $\hat{x}$ is different in every transmission, and the variance of $\int_0^{T_b} \hat{x}^2\,dt$ causes detection errors. This variance can be reduced by increasing the bit duration $T_b$. Alternatively, it can be reduced to zero by modifying the generated basis functions such that the transmitted energy $E_b$ is kept constant. A constant $E_b$ is achievable by normalizing the basis functions before each symbol transmission, such that
$$\int_0^{T_b} \hat{x}^2\,dt = \int_0^{T_b} \breve{x}^2\,dt = 1.$$
The second term on the right-hand side of Equation (3) gives rise to the cross-correlation estimation problem. It can also be reduced by selecting long chaotic signals or by using Walsh functions [9]. By choosing orthonormal basis functions, the estimated symbol $\hat{s}_{b1}$ can be written as
$$\hat{s}_{b1} = s_{b1} + \int_0^{T_b} n\,\hat{x}\,dt. \qquad (4)$$
In contrast to traditional demodulation methods, which often use synchronization (coherent schemes) or delayed reference transmission (non-coherent schemes) for reference recovery, the proposed DLCSK uses DL to learn and recover the reference signals $\hat{x}$ and $\breve{x}$. By using DL, it not only removes the need for chaotic synchronization during transmission, but can also reduce the effect of the last term in Equation (4) through an indirect channel estimation method. It should be noted that our main focus is not on the auto- and cross-correlation problems or on selecting proper chaotic maps. Therefore, we have chosen the Chebyshev map and the Logistic map because they are simple to generate.

SIMO DLCSK System Model
The general architecture of the SIMO DLCSK system is shown in Figure 1. The proposed design involves two phases. In the first (offline training) phase, chaotic sequences are transmitted under different channel conditions to train the receiver. In the second (deployment/test) phase, the modulated data are transmitted.

Chaotic Signals
We consider discrete-time chaotic signals generated by the Chebyshev map and the Logistic map for "0" and "1", respectively. The chaotic Chebyshev map generated by the second-order Chebyshev polynomial function is used, which can be written as $\hat{x}_{t+1} = 1 - 2\hat{x}_t^2$ [68]. The other chaotic map is the Logistic map, which is generated by the recursive function $\breve{x}_{t+1} = \rho\,\breve{x}_t(1 - \breve{x}_t)$ [69]. The parameter $\rho$ is called the bifurcation parameter, and the values of interest for $\rho$ lie in the interval [1, 4]. For $3.57 \le \rho \le 4$, the generated sequence is non-periodic and non-converging [70].
In this study, we evaluate the effect of changes in $\rho$ on the classifier's performance. The parameter changes can be intentional (to achieve higher accuracy) or unintentional (due to environmental conditions) [71]. When $\rho < 3.57$, the signals generated by the Logistic map show a periodic behavior, and a classifier can easily separate them from a chaotic signal (e.g., a signal generated by the Chebyshev map). There is a trade-off between classification accuracy and security. Since a periodic waveform can harm security, we select $\rho > 3.6$ for the case in which chaotic dynamics must be guaranteed. The chaotic maps have zero mean and unit variance, i.e., $E[\hat{x}] = E[\breve{x}] = 0$ and $E[\hat{x}^2] = E[\breve{x}^2] = 1$, where $E[\cdot]$ denotes the probabilistic expectation operator.
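The behavior described above can be checked with a minimal sketch. It assumes the Chebyshev recursion $x_{t+1} = 1 - 2x_t^2$ and the standard logistic recursion $x_{t+1} = \rho x_t(1 - x_t)$; the helper names (`chebyshev_map`, `logistic_map`, `standardize`, `distinct_tail_values`) and the per-segment standardization step are illustrative assumptions, not the authors' implementation.

```python
import statistics


def chebyshev_map(x0, length):
    """Iterate x_{t+1} = 1 - 2*x_t**2 (assumed second-order Chebyshev map)."""
    seq, x = [], x0
    for _ in range(length):
        x = 1.0 - 2.0 * x * x
        seq.append(x)
    return seq


def logistic_map(x0, length, rho):
    """Iterate x_{t+1} = rho*x_t*(1 - x_t) (standard logistic map)."""
    seq, x = [], x0
    for _ in range(length):
        x = rho * x * (1.0 - x)
        seq.append(x)
    return seq


def standardize(seq):
    """Shift and scale one segment to zero mean and unit variance (assumed step)."""
    mean = statistics.mean(seq)
    std = statistics.pstdev(seq)
    return [(v - mean) / std for v in seq]


def distinct_tail_values(rho, x0=0.3, transient=1000, tail=64, digits=6):
    """Count distinct values after the transient; a small count indicates a periodic orbit."""
    seq = logistic_map(x0, transient + tail, rho)
    return len({round(v, digits) for v in seq[transient:]})
```

Calling `distinct_tail_values` with a small bifurcation parameter returns only a handful of values, whereas a value deep in the chaotic regime keeps producing new ones, which is why a classifier can separate the periodic case so easily.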

Transmitter
The transmitter structure is similar to that of conventional coherent CSK transmitters. Thus, the proposed DLCSK demodulator can easily work with existing CSK transmitters. In the training phase, the transmitter generates two sets of chaotic signals, $\{\hat{x}^{(n)}\}_{n=1}^{N}$ and $\{\breve{x}^{(n)}\}_{n=1}^{N}$, using the Chebyshev and the Logistic maps, respectively. Each of these sets includes $N$ chaotic signals of length $\beta$ samples, i.e., $\hat{x}^{(n)} = \{\hat{x}^{(n)}_1, \ldots, \hat{x}^{(n)}_\beta\}$ and $\breve{x}^{(n)} = \{\breve{x}^{(n)}_1, \ldots, \breve{x}^{(n)}_\beta\}$. In other words, we use two sets of chaotic signals, each with a total length of $S = N\beta$ samples, where $N$ and $\beta$ are the number and the length of the chaotic signals, respectively. All of the chaotic signals $\hat{x}^{(n)}$ and $\breve{x}^{(n)}$, with known binary labels $j^{(n)} \in \{0, 1\}$, are transmitted repeatedly to train the receivers. All $M$ antennas obtain the corresponding channel-altered signals $r_m^{(n)}$, which are used to train the NN within a supervised learning framework. Therefore, the training set of the $m$th DLCSK receiver can be expressed as $\{(r_m^{(n)}, j^{(n)})\}_{n=1}^{2N}$, where $2N$ is the number of chaotic signals in the training set. Assuming a Rayleigh fading channel, $r_{m,t}$ (the $t$th sample of the vector $r_m$) can be modeled as a complex-valued random variable with real part $\Re(r_{m,t})$ and imaginary part $\Im(r_{m,t})$, where the operators $\Re(\cdot)$ and $\Im(\cdot)$ return the real and imaginary parts of a complex number, respectively. Therefore, the vector $r_m$ is separated into two vectors, i.e., $r_m = [\Re(r_m), \Im(r_m)]$, before being fed into the classifier, and the training set can be rewritten as $\{([\Re(r_m^{(n)}), \Im(r_m^{(n)})], j^{(n)})\}_{n=1}^{2N}$. Once the DL-based receiver is trained, it can be used for online demodulation and data detection. In the test/deployment phase, the DLCSK modulator maps a transmission bit $b^{(z)} \in \{0, 1\}$, $1 \le z \le Z$, to a chaotic waveform, where $b^{(z)}$ denotes the $z$th transmission bit and $Z$ is the number of data bits. Thus, in the test phase, the transmitter generates two sets of chaotic signals, i.e., $\{\hat{x}^{(z)}\}_{z=1}^{Z}$ and $\{\breve{x}^{(z)}\}_{z=1}^{Z}$, using the Chebyshev and the Logistic maps, respectively.
Each of these sets includes $Z$ chaotic signals of length $\beta$ samples. The transmitted signal is formed according to Equation (1), depending on the current bit $b^{(z)}$. Since in every practical communication system the chaos generator circuits may operate under different environmental conditions, it is essential to consider a parameter mismatch between the training and testing phases. To evaluate the system's generalization capability and robustness, we use different parameter settings in the two phases. Hence, to generate the Logistic maps, different bifurcation parameters $\rho_{\mathrm{train}}$ and $\rho_{\mathrm{test}}$ are chosen. In addition, to generate the Chebyshev maps, different initial states are chosen for the training and the deployment phases, i.e., $\hat{x}_1^{\mathrm{train}} \neq \hat{x}_1^{\mathrm{test}}$.
A transmission filter (or pulse-shaping filter) is commonly used in communication systems. This filter can take different forms, such as a Gaussian filter or a raised-cosine filter [72]. In this paper, we consider a rectangular pulse of unit amplitude on $[0, T_c]$, where $T_c$ represents the chip time. The noise power can be limited by a receiving filter at the receiver side. Note that these filters are not our main focus.
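The bit-to-waveform mapping of the deployment phase can be sketched as follows. This is a minimal illustration under stated assumptions: the Chebyshev recursion is taken as $x_{t+1} = 1 - 2x_t^2$, each segment is scaled to a fixed energy `e_b`, one chip per sample is used (rectangular pulse), and the names `chebyshev_stream`, `logistic_stream`, `next_segment`, and `csk_modulate` are hypothetical.

```python
import math
from itertools import islice


def chebyshev_stream(x0):
    """Yield chips of the (assumed) Chebyshev map x_{t+1} = 1 - 2*x_t**2."""
    x = x0
    while True:
        x = 1.0 - 2.0 * x * x
        yield x


def logistic_stream(x0, rho):
    """Yield chips of the logistic map x_{t+1} = rho*x_t*(1 - x_t)."""
    x = x0
    while True:
        x = rho * x * (1.0 - x)
        yield x


def next_segment(stream, beta, e_b):
    """Take the next beta chips and scale them so the segment energy equals e_b."""
    chips = list(islice(stream, beta))
    scale = math.sqrt(e_b / sum(c * c for c in chips))
    return [scale * c for c in chips]


def csk_modulate(bits, beta, stream0, stream1, e_b=1.0):
    """Map bit 0 to a segment of stream0 and bit 1 to a segment of stream1."""
    frame = []
    for bit in bits:
        frame.extend(next_segment(stream1 if bit else stream0, beta, e_b))
    return frame


# Example: four bits, beta = 50 chips per bit, initial states 0.3, rho = 3.6.
frame = csk_modulate([0, 1, 1, 0], 50, chebyshev_stream(0.3), logistic_stream(0.3, 3.6))
```

Using generators keeps each chaotic map running across bits, so consecutive symbols of the same class never reuse the same segment.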

Channel Model and Estimation
In an AWGN channel, a one-dimensional noisy version of the transmitted signal $s_b$ is observed at the receiver side. The $m$th receiver obtains the altered signal $r_m$ and makes a decision about the transmitted bit. In addition to the Gaussian noise effect, many other stochastic phenomena may occur in a practical communication channel. Conventional channel estimation approaches are sensitive to the quality of the pilot signals. One way of strengthening DL models is to let them learn the channel variations [73]. If the LSTM-based classifier is trained using a dataset that contains signals transmitted under different channel conditions, the classifier will be resilient to channel changes, eliminating the need for instantaneous channel estimation [74,75]. In this paper, we assume that the channel changes from one signal transmission to another, and that the transmitted signal acts as a pilot that carries channel information. Thus, the NN can simultaneously learn the different chaotic maps and estimate the statistical distribution of the fading.
Consider a multi-path fading channel with $L$ independent paths, where the channel coefficients follow a Rayleigh distribution. The probability density function (pdf) of the channel coefficient $\alpha$ can then be given as [76]
$$f(\alpha) = \frac{\alpha}{\delta^2}\exp\left(-\frac{\alpha^2}{2\delta^2}\right), \quad \alpha \ge 0,$$
where $\delta > 0$ is the scale parameter of the distribution, representing the root mean square value of the received voltage signal. Considering a multi-antenna system with $M$ antennas, if the $t$th transmitted sample is denoted by $s_{b,t}$, the $t$th received sample at the $m$th antenna can be modeled as
$$r_{m,t} = \sum_{l=1}^{L} \alpha_{m,l}\, s_{b,t-\tau_{m,l}} + n_{m,t},$$
where $\alpha_{m,l}$ represents the channel gain of the $l$th path between the transmit antenna and the $m$th receive antenna, and $\tau_{m,l}$ is the delay of the $l$th path. $L$ is the number of paths, and $n_{m,t}$ is the independent noise at each antenna, which is assumed to be additive white Gaussian noise with zero mean and variance $N_0/2$. Since we assume that the channel changes from one signal transmission to another, the channel gain $\alpha^{(n)}_{m,l}$ and the training SNR $\sigma^{(n)}_{tr}$ change after the $n$th channel realization. In particular, $\sigma^{(n)}_{tr}$ is a Gaussian random variable such that $\sigma^{(n)}_{tr} \in [\sigma_{tr,\min}, \sigma_{tr,\max}]$, where $\sigma_{tr,\min}$ and $\sigma_{tr,\max}$ are freely chosen SNR values. The goal of the training process is to train an NN with a complex-valued vector $r_m$. The complex input values are split into real and imaginary parts, i.e., $r_m = [\Re(r_m), \Im(r_m)]$, before being fed into the classifier. Therefore, we have two feature vectors containing the channel coefficients and stochastic phases. Later, a Softmax layer estimates the probability vectors $p_{n,j}$ from the input distribution for the $n$th observation, where $j$ indexes the possible classes (i.e., $j \in \{0, 1\}$), and the network optimizes the cross-entropy cost function in Equation (13). In the test phase, we use this trained network to detect unknown inputs.
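The received-sample model and the real/imaginary split can be illustrated with a short sketch. It is only an approximation of the setup described above: the taps are drawn as complex Gaussians (so their magnitudes are Rayleigh distributed), the average total tap power is normalized to one, the passed segment is assumed to be exactly one bit, and the function names are hypothetical.

```python
import math
import random


def draw_rayleigh_taps(num_paths, rng):
    """Complex Gaussian taps; magnitudes are Rayleigh, average total power is 1 (assumed)."""
    sigma = math.sqrt(0.5 / num_paths)
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(num_paths)]


def channel_output(segment, taps, delays, eb_n0_db, rng):
    """r_t = sum_l alpha_l * s_{t - tau_l} + n_t for a one-bit segment."""
    e_b = sum(abs(s) ** 2 for s in segment)          # energy of the bit
    n0 = e_b / (10.0 ** (eb_n0_db / 10.0))
    noise_std = math.sqrt(n0 / 2.0)                  # variance N0/2 per dimension
    received = []
    for t in range(len(segment)):
        acc = 0j
        for alpha, tau in zip(taps, delays):
            if t - tau >= 0:                         # samples before the bit are zero
                acc += alpha * segment[t - tau]
        acc += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        received.append(acc)
    return received


def split_real_imag(received):
    """Build the length-2*beta feature vector [Re(r), Im(r)] fed to the classifier."""
    return [z.real for z in received] + [z.imag for z in received]
```

Drawing new taps and a new training SNR for every transmitted segment is one way to realize the assumption that the channel changes from one signal transmission to the next.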

Receiver
We consider multiple antennas and LSTMs at the receiver end to establish a SIMO design and to obtain a diversity gain. This architecture relies on an ensemble method that fuses several classifiers with the goal of increasing the classification accuracy. In the following, we introduce the structure of a single LSTM-based classifier. The proposed classifier has five base layers: a sequence input layer, an LSTM/BiLSTM layer, a fully connected layer (size 2), a Softmax layer, and a classification layer. The sequence input layer is only used to fetch sequential input values of length $2\beta$. The adopted LSTM cell (unit) is shown in Figure 2. The forget gate $f_t$ determines how much of the current cell state should be forgotten, and the output gate $o_t$ controls which part of the information should be sent to the output. $C_{t-1}$ and $C_t$ denote the state value of the memory unit at the previous step and the current step, respectively. Similarly, $h_{t-1}$ and $h_t$ denote the output of the previous and the current states, respectively, whereas $r_t$ and $\bar{\sigma}$ represent the current input and the sigmoid function, respectively. Equation (10) illustrates the LSTM cell calculation process [77]. The forget gate decides which information will be remembered or forgotten based on the last hidden layer output $h_{t-1}$ and the current input $r_t$. The memory cell value $C_t$ is determined by the current candidate value $\tilde{C}_t$, its own previous state $C_{t-1}$, and the input and forget gates. The operator $(*)$ represents element-wise matrix multiplication, while $(\cdot)$ denotes point multiplication. $w$ represents the weight and $b$ is the bias parameter.
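For reference, the cell computation described above matches the widely used LSTM formulation, which can be written as follows. The subscripted weights and biases ($w_f$, $w_i$, $w_C$, $w_o$, $b_f$, $b_i$, $b_C$, $b_o$), the input gate symbol $i_t$, the candidate state $\tilde{C}_t$, and the sigmoid symbol $\bar{\sigma}$ are notational assumptions, so the exact form of Equation (10) may differ.

```latex
\begin{aligned}
f_t &= \bar{\sigma}\left(w_f \cdot [h_{t-1}, r_t] + b_f\right) \\
i_t &= \bar{\sigma}\left(w_i \cdot [h_{t-1}, r_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(w_C \cdot [h_{t-1}, r_t] + b_C\right) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\
o_t &= \bar{\sigma}\left(w_o \cdot [h_{t-1}, r_t] + b_o\right) \\
h_t &= o_t * \tanh\left(C_t\right)
\end{aligned}
```

Here $[h_{t-1}, r_t]$ denotes the concatenation of the previous output and the current input, and $*$ is the element-wise product.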
In this work, motivated by the features of the LSTM, a bidirectional (BiLSTM) arrangement is implemented for the classification tasks. The hidden state of the BiLSTM at time $t$ can be calculated as the weighted sum of the forward hidden state $\overrightarrow{h}_t$ and the backward hidden state $\overleftarrow{h}_t$ as follows:
$$h_t = w_t\,\overrightarrow{h}_t + v_t\,\overleftarrow{h}_t,$$
where $w_t$ and $v_t$ denote the weights corresponding to $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, respectively. The number of hidden units indicates the number of BiLSTM units that are placed in the hidden layer of the network.
The fully connected layer receives the output of the BiLSTM layer in order to increase the stability of the output by performing more non-linear operations. The fully connected layer has size two, matching the two output classes. The Softmax layer is an activation function that calculates a probability for each sequence and sends the results to the next layer. This layer contains two nodes, which is the same as the number of output classes. The Softmax function can be written as [78]
$$\gamma(U_t)_j = \frac{\exp(U_{t,j})}{\sum_{i \in I} \exp(U_{t,i})},$$
where $\gamma(U_t)_j = p_{n,j}$ is the probability that the vector $U_t$ is a member of the $j$th class ($j \in \{0, 1\}$), and $I = \{0, 1\}$ is the set of all possible classes. In other words, $p_{n,0}$ and $p_{n,1}$, $1 \le n \le N$, represent the probability that the transmitted chaotic signal is "0" or "1", respectively. The goal of the training process is to minimize the categorical cross-entropy loss function,
$$L(\theta) = -\frac{1}{a_t}\sum_{n=1}^{a_t}\sum_{j \in I} \bar{p}_{n,j}\,\log p_{n,j}, \qquad (13)$$
where $a_t$ is the mini-batch size, $\theta$ is the set of network parameters corresponding to the different layers, $p_{n,j}$ is the Softmax layer's output probability for output class $j$ and observation $n$, and $\bar{p}_{n,j} \in \{0, 1\}$ is the binary indicator of whether class label $j$ is the correct classification for observation $n$. A popular algorithm to obtain $\theta$ is the Stochastic Gradient Descent (SGD) method [79], which starts with a random initial value $\theta = \theta_0$ and iteratively updates $\theta$ as
$$\theta_{k+1} = \theta_k - \eta\,\nabla \tilde{L}(\theta_k),$$
where $\eta > 0$ is the learning rate, and $\tilde{L}(\theta_k)$ is an approximation of the loss function that is computed on a random mini-batch of training examples of size $a_t$ at each iteration. Through offline training, a network with optimized weights and biases is formed that can be used for online signal demodulation. The proposed training algorithm is summarized in Algorithm 1.
Algorithm 1 (recoverable steps). Train the $m$th NN-based receiver, including:
- Sequence input layer;
- LSTM/BiLSTM layer;
- Fully connected layer;
- Softmax layer;
- Classification layer;
8: End of training.
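The Softmax, cross-entropy, and SGD steps above can be illustrated on a toy problem. The sketch below replaces the BiLSTM with a single linear layer (an explicit simplification, not the proposed network) so that the gradient can be written by hand; the function names and the toy data are assumptions.

```python
import math


def softmax(u):
    """Numerically stable softmax over a list of scores."""
    m = max(u)
    exps = [math.exp(v - m) for v in u]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x):
    """Class probabilities of a linear layer (no bias) followed by softmax."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def batch_loss(weights, batch, labels):
    """Mean categorical cross-entropy over a mini-batch; labels are class indices."""
    return -sum(math.log(forward(weights, x)[j])
                for x, j in zip(batch, labels)) / len(labels)


def sgd_step(weights, batch, labels, eta):
    """One update theta <- theta - eta * gradient of the mini-batch loss."""
    grads = [[0.0] * len(weights[0]) for _ in weights]
    for x, j in zip(batch, labels):
        p = forward(weights, x)
        for c in range(len(weights)):
            err = p[c] - (1.0 if c == j else 0.0)   # d(loss)/d(score_c)
            for d, v in enumerate(x):
                grads[c][d] += err * v / len(labels)
    return [[w - eta * g for w, g in zip(row, grow)]
            for row, grow in zip(weights, grads)]
```

With the learning rate $\eta = 0.01$ used in the experiments, repeated calls to `sgd_step` on random mini-batches would lower `batch_loss` gradually; in the actual receiver the gradient is obtained by backpropagation through the BiLSTM rather than by this closed form.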

Decision Combining Rule
Reliable communication over multi-path channels highly depends on the condition of the paths, and the probability of deep fade. Spatial diversity is used in conventional coherent communication systems for combating the destructive effects of small-scale fading, and thereby for improving reliability. However, these receivers are very sensitive to the accuracy of the channel estimation process. Since the proposed classifier is trained under different channel conditions, the receiver does not require complex channel estimation techniques or soft data combining methods, such as Equal Gain Combining (EGC) and Maximal Ratio Combining (MRC) [80], for data detection. We can combine hard outputs of several classifiers to achieve a diversity gain through a simple decision rule.
Based on the received vector $r_{t,m}$, each of the LSTMs produces a local decision and reports this decision to a Fusion Center (FC), which makes the final decision. There are several methods for fusing the decisions of the classifiers, such as majority voting and ensemble averaging [81]. In this paper, since the decisions are binary-valued, the FC applies the majority voting fusion rule to generate the final decision. The class with the highest overall count is selected as the final decision. Mathematically, the decision class $O(r)$ can be calculated as [82]
$$O(r) = \arg\max_{j \in \{0,1\}} \sum_{m=1}^{M} \emptyset_{j,m},$$
where $M$ is the number of classifiers, $j \in \{0, 1\}$ denotes the $j$th class, $C_m(r)$ represents the output of the $m$th classifier for the received vector $r$, and $\emptyset_{j,m}$ is a binary characteristic function that can be defined as
$$\emptyset_{j,m} = \begin{cases} 1, & \text{if } C_m(r) = j, \\ 0, & \text{otherwise.} \end{cases}$$
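The majority voting rule at the FC can be sketched in a few lines. The function names and the tie-breaking choice (ties go to class 0, which cannot occur for an odd number of classifiers) are assumptions made for illustration.

```python
def majority_vote(decisions):
    """Return the class (0 or 1) reported by most classifiers; ties go to 0 (assumed)."""
    counts = [0, 0]
    for d in decisions:
        counts[d] += 1
    return 0 if counts[0] >= counts[1] else 1


def fuse(decisions_per_antenna):
    """Fuse hard decisions bit by bit; input is one list of bits per antenna."""
    return [majority_vote(column) for column in zip(*decisions_per_antenna)]


# Example with M = 3 classifiers and four transmitted bits.
fused = fuse([[1, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 1]])
```

Because only hard decisions are exchanged, the FC needs no channel state information, in contrast to soft combining schemes such as EGC or MRC.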

Simulation Results
In this section, we first compare the BER performance of the Single-Input Single-Output (SISO) DLCSK with that of the conventional DCSK over AWGN and multi-path Rayleigh fading channels. Then, we evaluate the effect of various parameters, such as the bifurcation parameter and the number of antennas, on the system performance.
In all simulations the classifiers are trained only at a limited SNR range, i.e., σ (n) tr ∈ [σ tr,min , σ tr,max ] dB, while tested over a wide range of E b /N 0 values greater than 0 dB.
In order to generate chaotic signals, we choose two discrete-time recursive functions, i.e., the Chebyshev and Logistic maps, which have been used extensively in practical communication systems [40,83]. During the training phase, the transmitter creates two sets of chaotic signals using these two maps, according to the values in Table 1. For example, in case 1, each of these sets includes $N = 2000$ chaotic signals, each with a length of $\beta = 50$ samples. In other words, we use two sets of training chaotic signals, each with a total length of $S = 50 \times 2000 = 10^5$ samples. All of the chaotic signals carry their corresponding class labels $j \in \{0, 1\}$. The receiver obtains the corresponding altered signals and forms the final training set as the input of the NN. Therefore, after $2 \times 2000 = 4000$ signal transmissions, we obtain a training set consisting of 4000 received signals along with their corresponding class labels $j$. In the test phase, a Chebyshev map is transmitted if a binary "0" is to be sent, and a Logistic map is transmitted if a "1" is to be sent.
In our simulations, to evaluate the system's generalization capability and robustness against parameter mismatch, we use different parameter settings for the training and testing phases. To generate the Logistic maps, the bifurcation parameters are chosen as $\rho_{\mathrm{train}} = 3.6$ and $\rho_{\mathrm{test}} = 3.3$ for the training and the deployment phases, respectively, with an initial state of $\breve{x}_1 = 0.3$. To generate the Chebyshev maps, the initial states are chosen as $\hat{x}_1 = 0.3535$ and $\hat{x}_1 = 0.3$ for the training and the deployment phases, respectively.
Predefined functions of the MATLAB Neural Network Toolbox can be used to define an LSTM network and specify training options, including the learning rate ($\eta$) and the number of hidden units ($H$). In all experiments, the learning rate is set to $\eta = 0.01$. The other DL parameters are selected based on the SIMO DLCSK parameters. Although we consider the AWGN and Rayleigh fading channels, SIMO DLCSK can be applied to any other channel model as well. The considered network parameters are listed in Table 1.
The training accuracy represents the classification accuracy of each mini-batch during the training process. The smoothed training accuracy is less noisy than the training accuracy and makes it easier to observe trends. The training loss shows the value of the loss function, i.e., the categorical cross-entropy, on each mini-batch. The results show that the loss function converges rapidly, within about 100 iterations. When the DLCSK system is trained at relatively low SNRs (for instance, when $\sigma^{(n)}_{tr} \in [11, 15]$ dB), the training accuracy converges more slowly. Since the chaotic signals then contain more stochastic features, the fluctuations of the curve and the difficulty of training increase. However, even for $\sigma^{(n)}_{tr} \in [11, 15]$ dB, the model achieves a relatively good accuracy and converges to a small final error.
Figure 4 shows the simulated BER performance of the DLCSK, non-coherent DCSK, chaotic switching CSK, and antipodal CSK over the AWGN channel. Note that the simulated coherent CSK systems are plotted assuming perfect chaotic synchronization and only provide benchmark data for evaluation. Antipodal CSK is a chaotic modulation scheme with one basis function that can theoretically achieve the BER performance of binary phase shift keying (BPSK) under AWGN channels. In our simulations, the Logistic map is used for antipodal CSK, such that a chaotic signal $\breve{x}$ is transmitted if "0" is to be sent, and $-\breve{x}$ is transmitted if "1" is to be sent.
The results are also compared with those of chaotic switching, a special case of CSK with two basis functions, in which the transmitted samples are obtained from the Chebyshev and the Logistic maps. The BER curve of the chaotic switching scheme is also simulated assuming a correlation receiver with perfect chaotic synchronization.
The proposed SISO DLCSK scheme shows a gain over conventional DCSK at lower SNRs when the DLCSK system is trained at relatively low SNR values (training SNR $\sigma^{(n)}_{tr} \in [11, 15]$ dB). Since the auto- and cross-correlation properties of the chaotic signals are similar in this experiment, according to Equation (3) this gain comes from reducing the third term, i.e., $\int_0^{T_b} n\,\hat{x}\,dt$. When the training process is performed under different channel conditions, the NN can indirectly estimate the noise distribution and use it in the iterative minimization of the cross-entropy cost function. As an important result, DLCSK shows a more robust behaviour in the test phase.
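The BPSK benchmark mentioned above, which antipodal CSK can theoretically reach under AWGN, has the well-known closed form $P_b = Q(\sqrt{2E_b/N_0}) = \tfrac{1}{2}\,\mathrm{erfc}(\sqrt{E_b/N_0})$. A minimal sketch for evaluating this reference curve is given below; the function name is an assumption, and the values are theoretical rather than results from the paper.

```python
import math


def bpsk_ber_awgn(eb_n0_db):
    """Theoretical BPSK bit error rate over AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(eb_n0))


# Reference curve on the E_b/N_0 grid typically used for BER plots.
reference = [(snr_db, bpsk_ber_awgn(snr_db)) for snr_db in range(0, 13, 2)]
```

Plotting simulated BER values against this curve gives a quick sanity check of a simulation chain before the chaotic schemes are compared.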

BER Performance under AWGN Channel
When $\sigma_{tr}$ is relatively high, the LSTM learns only from nearly clean signals. For proper training, the training SNR must allow the LSTM to learn both clean and noisy samples. The results show that under AWGN channels, when $\sigma^{(n)}_{tr} \in [19, 23]$ dB, the performance of the SISO DLCSK is close to that of the conventional DCSK system. Therefore, for a better BER performance in high-SNR conditions, the receiver should be trained at higher SNRs. These results are encouraging, because the above-mentioned training options allow us to design a receiver with a flexible data rate that depends on the specific channel conditions.
Figure 5 depicts Monte Carlo simulations of the BER performance of the SISO DLCSK system for $\beta = 50$ under an AWGN channel. The DLCSK system is trained over a limited SNR range, i.e., [19, 23] dB, and tested over the whole SNR range. The other parameters for both cases 1 and 2 are selected according to Table 1. The results show that the BER performance of DLCSK has a low sensitivity to changes in the hyper-parameters, such as the number of training samples $S$ and the number of hidden units $H$. There is a trade-off between security and classification accuracy. When a system is trained with $\rho_{\mathrm{train}} = 3.6$, using $\rho_{\mathrm{test}} = 3.6$ results in more non-periodic behavior. However, reducing $\rho_{\mathrm{test}}$ may result in a better BER performance.

Confusion Matrix, Sensitivity, and Specificity
We provide the confusion matrix of the proposed method, which helps in analyzing the performance of our classification algorithm. Figure 6 depicts an example of two confusion matrices with 10,000 chaotic symbols for AWGN channels for β = 50, $\sigma_{tr}^{(n)} \in [19, 23]$ dB, and test SNRs = {16, 23} dB. Figure 7 presents the sensitivity and specificity measures extracted from the confusion matrix. Both are statistical measures of the performance of a binary classification test that are widely used in the literature [84,85]. The sensitivity, or True Positive Rate (TPR), measures the proportion of Logistic maps that are correctly identified. The specificity, or True Negative Rate (TNR), measures the proportion of Chebyshev maps that are correctly identified. The terms "True Positive (TP)", "False Positive (FP)", "True Negative (TN)", and "False Negative (FN)" address the correctness of a classification test. For example, if the condition is sending the signals generated by the Logistic map, "TP" means "correctly predicted as Logistic map", "FP" means "incorrectly predicted as Logistic map", "TN" means "correctly predicted as Chebyshev map", and "FN" means "incorrectly predicted as Chebyshev map". Figure 8 shows the sensitivity and specificity for different bifurcation parameters. The sensitivity curve in this test shows how well the LSTM classifies samples that are generated by the Logistic map. Sensitivity is also referred to as the recall or hit rate [86]. It is the fraction of samples correctly identified as Logistic out of all samples generated by the Logistic map, i.e., TP/(TP + FN). The specificity is a measure of how well the test can identify true Chebyshev maps, and can be expressed as TN/(TN + FP). For high SNRs (i.e., SNR > 16 dB), even a single LSTM-based classifier achieves sensitivity and specificity rates above 95%. However, there is a trade-off between sensitivity and specificity at low SNRs, where the curves show low sensitivity and high specificity.
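The two rates defined above follow directly from the four confusion-matrix counts. The short Python sketch below shows the arithmetic; the function name and the counts are hypothetical and are not read from Figure 6. They are chosen only to represent a low-sensitivity, high-specificity operating point.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (TPR, TNR) from confusion-matrix counts.

    Positive class: Logistic map; negative class: Chebyshev map.
    """
    sensitivity = tp / (tp + fn)  # TP / (TP + FN)
    specificity = tn / (tn + fp)  # TN / (TN + FP)
    return sensitivity, specificity

# Hypothetical counts: 10,000 Logistic symbols and 10,000 Chebyshev symbols.
sens, spec = sensitivity_specificity(tp=6569, fn=3431, tn=9996, fp=4)
false_negative_rate = 1.0 - sens
false_positive_rate = 1.0 - spec
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

The false negative and false positive rates are simply the complements of the sensitivity and specificity, so only two of the four rates need to be reported.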
For example, when SNR = 12 dB, $\rho_{train} = 3.6$, and $\rho_{test} = 3.3$, the sensitivity of a single LSTM-based classifier is 65.69% and its specificity is 99.96%; the corresponding false negative and false positive rates are 34.31% and 0.04%, respectively. Sensitivity values contain useful information about the receiver's performance that can be utilized for different goals. Here, we evaluate the effect of changes in the bifurcation parameter on the classifier's performance. For example, $\rho_{train} = 3.6$ and $\rho_{test} = 3.3$ lead to better classification accuracy than other parameter settings for SNR < 14 dB. However, with $\rho_{test} = 3.3$, the Logistic map shows periodic behavior, and this change can have a negative effect on security.
Figure 9 presents the performance obtained for case 1 using different numbers of training epochs (E). The system achieves its best performance when E = 20. This figure also shows the sensitivity to the number of epochs: the SISO DLCSK system is relatively robust to it. For example, for a target BER of $10^{-2}$, there is a 1 dB gap between the worst-case and the best-case scenarios.
Figure 10 illustrates a comparison between the BER performance of the SISO DLCSK and conventional DCSK over multi-path Rayleigh fading channels, for β = 50 and $\sigma_{tr}^{(n)} \in [19, 23]$ dB. We consider two cases corresponding to different path gain ratios and different path delays. In the first case, a two-path channel is considered (L = 2) in which the two paths have the same average power gain. In this case, the average power gain of each path is 0.5 (i.e., $E(\alpha_1^2) = E(\alpha_2^2) = 0.5$), with $\tau_1 = 0$ and $\tau_2 = 2$. In the second case, three paths (L = 3) are considered with different average power gains. The average power gains are $E(\alpha_1^2) = 1/7$, $E(\alpha_2^2) = 2/7$, and $E(\alpha_3^2) = 4/7$, with $\tau_1 = 0$, $\tau_2 = 3$, and $\tau_3 = 6$. The average power gains of adjacent paths therefore differ by a factor of two, i.e., by about 3 dB.
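To make the channel description concrete, the following Python sketch applies a multi-path Rayleigh channel with prescribed average power gains and integer delays to one symbol. It assumes quasi-static fading (one coefficient per path per symbol), real-valued baseband samples, and delays measured in samples; these choices and the function name are illustrative assumptions rather than the paper's simulation code. The parameters in the usage lines correspond to the first (two-path) case.

```python
import numpy as np

def rayleigh_multipath(signal, avg_power_gains, delays, rng):
    """Pass `signal` through a quasi-static multi-path Rayleigh channel.

    avg_power_gains: target E[alpha_l^2] for each path.
    delays: integer path delays in samples.
    """
    out = np.zeros(len(signal) + max(delays))
    for power, delay in zip(avg_power_gains, delays):
        # A Rayleigh variable with scale s has E[alpha^2] = 2 * s^2,
        # so s = sqrt(power / 2) gives the requested average power gain.
        alpha = rng.rayleigh(scale=np.sqrt(power / 2.0))
        out[delay:delay + len(signal)] += alpha * signal
    return out

rng = np.random.default_rng(1)
symbol = rng.uniform(-1.0, 1.0, size=50)  # placeholder for beta = 50 chaotic samples
received = rayleigh_multipath(symbol, avg_power_gains=(0.5, 0.5), delays=(0, 2), rng=rng)
```

Because the average power gains sum to one in both cases, the channel does not change the average received energy, so BER curves for the two cases can be compared on the same SNR axis.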
When the receiver is already accustomed to chaotic signals, we do not need to transmit a reference signal, which means that less energy is used to transmit one bit. As shown in Figure 10, the SISO DLCSK overcomes the BER inefficiency of the existing DCSK under Rayleigh fading channels. For further benchmark comparisons, we consider the antipodal CSK system. The BER performance of DLCSK is close to that of antipodal CSK when $E_b/N_0 \le 10$ dB, and both systems have similar performance when $E_b/N_0$ is more than 10 dB.
Figure 11 compares the BER performance of the SIMO DLCSK with that of the SISO DLCSK under Rayleigh fading channels. The number of receive antennas is M = {1, 3, 5}, and the spreading factor is β = 50. The results show that the SIMO DLCSK attains a better BER performance due to the diversity gain. For example, when M = 1, the SNR required to achieve the target BER = $10^{-2}$ is about 20 dB. If we increase the number of antennas to M = 5, the SNR required to achieve BER = $10^{-2}$ is only about 15 dB. Figure 11 also compares the BER performance of the SIMO DLCSK with the recently published LSTM-aided OFDM DCSK in [40]. The LSTM-aided NN calculates the correlations between chaotic modulated OFDM-DCSK signals in order to retrieve the data. For a fair comparison, we consider a SIMO DLCSK with M = 5 and an LSTM-aided OFDM DCSK with K = 5, where K indicates the number of independent sub-channels in such a system, so that both systems use five independent sub-channels. The LSTM-aided scheme transmits reference signals that do not carry useful information. The proposed DLCSK scheme shows an expected gain compared to the LSTM-aided OFDM DCSK because it uses indirect channel estimation and removes the burden of the additional reference sample transmission.
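The diversity gain can be illustrated with a back-of-the-envelope calculation. The combining rule of the SIMO DLCSK is not specified in this section, so the Python sketch below assumes, purely for illustration, that M per-antenna classifiers make independent hard decisions with the same error probability p and that a majority vote fuses them. Under that assumption the fused error probability is a binomial tail sum.

```python
from math import comb

def majority_vote_error(p, m):
    """Error probability of a majority vote over m independent binary decisions.

    p: error probability of each individual decision (assumed equal and independent).
    m: odd number of decisions, so that ties cannot occur.
    """
    if m % 2 == 0:
        raise ValueError("m must be odd to avoid ties")
    # The vote is wrong when more than half of the individual decisions are wrong.
    return sum(comb(m, k) * p**k * (1.0 - p) ** (m - k)
               for k in range(m // 2 + 1, m + 1))

for m in (1, 3, 5):
    print(m, majority_vote_error(0.1, m))
```

Under these assumptions the error probability falls quickly with M, which mirrors the qualitative trend in Figure 11; the actual gains reported there depend on the receiver's real fusion rule and on the correlation between antennas.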

Conclusions
This paper represents an attempt to develop a new generation of chaos-based communication systems based on DL. We have introduced a trainable DLCSK receiver, which does not need any reference signal transmission or chaotic synchronization. Thus, the DLCSK is more practical in terms of reliability and compatibility with modern communication infrastructure. A multi-antenna design is presented to achieve a diversity gain. The main objective of the multi-antenna receiver is to improve the classification accuracy of the individual classifiers. Simulation results verify that the SIMO DLCSK system provides an excellent BER performance without channel estimation and complex combining modules at the receiver side. The proposed architecture is an appealing candidate for next generation wireless communications, such as massive MIMO systems and cloud/edge-based communications.

Funding: This work is supported by the Tier2 Canada research chair entitled 'Towards a Novel and Intelligent Framework for the Next generations of IoT Networks'.